Insights

Overall admission results


  • 81 applicants.
  • Accepted (~35%), rejected (~30%), and waitlisted (~35%).
    • Average acceptance among U.S.-based colleges in 2021 was 66%. Source.
  • Increased selectivity.

Admission results by gender


  • male 44, admit 15 + decline 16 + waitlist 13
  • female 34, admit 12 + decline 12 + waitlist 10

Applicants by geolocations (1)


Census Regions and Divisions

  • Unfeasible to check the data by states.
  • Aggregation by regions.
  • Three states are the major origins of applicants.

Applicants by geolocations (2)


  • No significant difference in quantitative measures of applicants.

Actionable Decisions

Decision Column 1

Geolocation

  • Due to:
    • Scarcity of applicant records.
    • Aggregation by regions.
  • Regions:
    • South (51%), West (46%)
    • Northeast (3%), Midwest (0%)

☑️ Recommend an overhaul of the College’s recruitment strategy, especially at it pertains to these areas.

Decision Column 2

Gender

Decision 2

  • Distinct gender difference.
  • Similar acceptance rate. No sexual discrimination present.
  • Room for improvement (12%).

☑️ Recommend the college takes a closer look at how it is marketing itself to and engaging with potential female applicants.

About

Left Column Text

The dataset contains 81 valid admission results from the CSV file SummerStudentAdmissions2.csv.

Three versions of this dataset are included on the right hand side:

  • Standardized data.
  • Cleaned data.
  • Raw data.

Due to the lack of information, some of the variables and contents from the dataset are interpreted intuitively.

In the cleaned dataset,

  • gender=-1 means the gender is undisclosed.
  • volunteer_level is ranked from 5 to 0.
  • gpa is calculated on a 4.0 scale.
  • writing_score should be on a 100 scale.
  • test_score has rather limited information.
  • work_exp’s unit is year.

The dashboard is powered by


The codes are open-sourced. Please feel free to star or fork this repository.

Star Fork

Right Column Table

Standardized data

Cleaned data

Raw data

---
title: "Admission Dashboard"
output: 
  flexdashboard::flex_dashboard:
    storyboard: true
    orientation: columns
    vertical_layout: fill
    social: ["twitter", "linkedin"]
    source_code: embed
    theme: bootstrap
    logo: static/logo.png
    favicon: static/favicon.png
    css: style.css
---

```{r setup-and-data-loading, include=FALSE}
gc()
rm(list = ls())

if (!require("pacman")) install.packages("pacman")
pacman::p_load(
  tidyverse, flexdashboard,
  here, styler, patchwork,
  hrbrthemes, ggthemes, ggtext, plotly,
  glue, waffle, DT, geofacet, ggbeeswarm,
  ggridges, treemapify
)

dat <- read_csv(here("data/data-cleaned.csv"))

style_file("index.Rmd")

# standardize all numeric variables

dat_stand <- dat |>
  mutate(
    decision = as_factor(decision),
    state = as_factor(state),
    gender = as_factor(gender),
    across(where(is.numeric), ~ round(scale(.)[, 1], 2)),
    partition = case_when(
      state %in% c("California", "Colorado", "Utah", "Oregon") ~ "west",
      state %in% c("Vermont", "New York") ~ "northeast",
      TRUE ~ "south"
    )
  )

dat_stand_long <- dat_stand |> pivot_longer(
  cols = c(
    gpa, work_exp, test_score,
    writing_score, volunteer_level
  ),
  names_to = "variable",
  values_to = "value"
)

dat_long <- dat |>
  mutate(partition = case_when(
    state %in% c("California", "Colorado", "Utah", "Oregon") ~ "west",
    state %in% c("Vermont", "New York") ~ "northeast",
    TRUE ~ "south"
  )) |>
  pivot_longer(
    cols = c(
      gpa, work_exp, test_score,
      writing_score, volunteer_level
    ),
    names_to = "variable",
    values_to = "value"
  )
```

Insights {.storyboard data-icon="fa-chart-line" data-commentary-width=200}
===================================== 


### **Overall admission results**

```{r, fig.width=8, fig.height=8}
dat |>
  count(decision) -> admission_summary

p1 <- ggplot(admission_summary, aes(fill = decision, values = n)) +
  geom_waffle(color = "white", size = 1.125, n_rows = 9, flip = TRUE) +
  scale_fill_manual(
    values = c("#1A6899", "#FC5449", "#FFCF58"),
    labels = c("Admit", "Decline", "Waitlist")
  ) +
  coord_equal() +
  theme_ipsum_rc() +
  theme(
    legend.position = "bottom",
    legend.title = element_text(color = "#000000"),
    legend.text = element_text(color = "#000000"),
    axis.title.y = element_text(
      family = "IBM Plex Sans", face = "bold"
    ),
    axis.title.x = element_text(
      family = "IBM Plex Sans", face = "bold"
    ),
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    text = element_text(
      family = "IBM Plex Sans",
      color = "#3B372E"
    ),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_markdown(hjust = 0.5),
    plot.caption = element_markdown(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  ) +
  labs(
    title = "SUMMER 2022 ADMISSION RESULTS",
    subtitle = "BAD DATA EXCLUDED.",
    caption = glue("SOURCE: SUMMERSTUDENTADMISSION2.CSV"),
    x = "",
    y = ""
  )

p1
```

***

- **81 applicants.**
- Accepted (~35%), rejected (~30%), and waitlisted (~35%).
  - Average acceptance among U.S.-based colleges in 2021 was 66%. [Source](https://www.collegedata.com/resources/the-facts-on-fit/understanding-college-selectivity).
- **Increased selectivity.**

### **Admission results by gender**

```{r, fig.width=12,fig.height=8}
p2 <- dat |>
  select(decision, gender) |>
  mutate(gender = as_factor(gender)) |>
  ggplot(aes(gender)) +
  geom_bar(aes(fill = decision),
    position = position_stack(reverse = TRUE),
    width = 0.2
  ) +
  scale_fill_manual(
    values = c("#1A6899", "#FC5449", "#FFCF58"),
    labels = c("Admit", "Decline", "Waitlist")
  ) +
  scale_x_discrete(labels = c("undisclosed", "female", "male")) +
  theme_ipsum_rc() +
  theme(
    legend.position = "bottom",
    legend.title = element_text(color = "#000000"),
    legend.text = element_text(color = "#000000"),
    axis.title.y = element_text(
      family = "IBM Plex Sans", face = "bold"
    ),
    axis.title.x = element_text(
      family = "IBM Plex Sans", face = "bold"
    ),
    axis.text.x = element_text(),
    axis.text.y = element_text(),
    text = element_text(
      family = "IBM Plex Sans",
      color = "#3B372E"
    ),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_markdown(hjust = 0.5),
    plot.caption = element_markdown(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  ) +
  labs(
    title = "SUMMER 2022 ADMISSION RESULTS",
    subtitle = "BY GENDER",
    caption = glue("SOURCE: SUMMERSTUDENTADMISSION2.CSV"),
    x = "gender",
    y = "count"
  )

p2
```

*** 

- male 44, admit 15 + decline 16 + waitlist 13
- female 34, admit 12 + decline 12 + waitlist 10

### **Applicants by geolocations (1)**

```{r, fig.width=12,fig.height=8}
p3 <- dat_stand |>
  group_by(state, partition) |>
  summarize(count = n()) |>
  ggplot(aes(
    area = count, fill = partition,
    label = count, subgroup = state
  )) +
  geom_treemap() +
  geom_treemap_subgroup_border(color = "white", size = 2) +
  geom_treemap_subgroup_text(
    place = "centre", grow = TRUE,
    alpha = 0.5, colour = "white",
    fontface = "italic"
  ) +
  geom_treemap_text(
    color = "white", place = "bottomright",
    alpha = 0.7, fontface = "bold"
  ) +
  scale_fill_brewer(palette = "Set1") +
  theme_ipsum_rc() +
  theme(
    legend.position = "bottom",
    legend.title = element_text(color = "#000000"),
    legend.text = element_text(color = "#000000"),
    axis.title.y = element_text(
      family = "IBM Plex Sans", face = "bold"
    ),
    axis.title.x = element_text(
      family = "IBM Plex Sans", face = "bold"
    ),
    # axis.text.x = element_blank(),
    # axis.text.y = element_blank(),
    text = element_text(
      family = "IBM Plex Sans",
      color = "#3B372E"
    ),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_markdown(hjust = 0.5),
    plot.caption = element_markdown(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  ) +
  labs(
    title = "TREEMAP OF APPLICANTS",
    subtitle = "SUMMER 2022",
    caption = glue("SOURCE: SUMMERSTUDENTADMISSION2.CSV"),
    x = "",
    y = ""
  )

p3
```

*** 

![Census Regions and Divisions](https://upload.wikimedia.org/wikipedia/commons/thumb/f/f1/Census_Regions_and_Division_of_the_United_States.svg/1280px-Census_Regions_and_Division_of_the_United_States.svg.png){width=100%}

- Unfeasible to check the data by states.
- Aggregation by **regions**.
- Three states are the major origins of applicants.

### **Applicants by geolocations (2)**

```{r, fig.width=12,fig.height=8}
p4 <- dat_long |>
  ggplot(aes(x = value, y = partition, color = partition, fill = partition)) +
  geom_density_ridges(alpha = 0.7) +
  scale_y_discrete(expand = c(0, 0)) + # will generally have to set the `expand` option
  scale_x_continuous(expand = c(0, 0)) + # for both axes to remove unneeded padding
  coord_cartesian(clip = "off") + # to avoid clipping of the very top of the top ridgeline +
  facet_wrap(~variable, scales = "free") +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  theme_ipsum_rc() +
  theme(
    legend.position = "bottom",
    legend.title = element_text(color = "#000000"),
    legend.text = element_text(color = "#000000"),
    axis.title.y = element_text(
      family = "IBM Plex Sans", face = "bold"
    ),
    axis.title.x = element_text(
      family = "IBM Plex Sans", face = "bold"
    ),
    # axis.text.x = element_blank(),
    # axis.text.y = element_blank(),
    text = element_text(
      family = "IBM Plex Sans",
      color = "#3B372E"
    ),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_markdown(hjust = 0.5),
    plot.caption = element_markdown(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  ) +
  labs(
    title = "RIDGEPLOT OF APPLICANTS' QUANTITATIVE MEASURES",
    subtitle = "SUMMER 2022",
    caption = glue("NOT ENOUGH NORTHEAST DATA FOR RIDGE PLOT.
SOURCE: SUMMERSTUDENTADMISSION2.CSV"), x = "candidates' standardized measurement", y = "density" ) p4 ``` *** - **No significant difference in quantitative measures of applicants.** Actionable Decisions {data-icon="fa-graduation-cap" data-orientation=columns} ===================================== Decision Column 1 ------------------------------------- ### Geolocation - Due to: - Scarcity of applicant records. - Aggregation by regions. - Regions: - South (51%), West (46%) - Northeast (3%), Midwest (0%) #### ☑️ Recommend an overhaul of the College's recruitment strategy, especially at it pertains to these areas. ```{r} # ggsave(filename = "static/decision2.png", plot = p2, width = 8, height = 6) # ggsave(filename = "static/decision1.png", plot = p3, width = 8, height = 6) ``` ![](static/decision1-edit.png){width=100%} Decision Column 2 ------------------------------------- ### Gender #### Decision 2 - **Distinct** gender difference. - Similar acceptance rate. **No sexual discrimination present.** - Room for improvement (12%). #### ☑️ Recommend the college takes a closer look at how it is marketing itself to and engaging with potential female applicants. ![](static/decision2-edit.png){width=100%} About {data-icon="fa-info" data-orientation=columns} ===================================== Left Column Text {data-width=350} ----------------------------------------------------------------------- The dataset contains 81 valid admission results from the CSV file `SummerStudentAdmissions2.csv`. Three versions of this dataset are included on the right hand side: - Standardized data. - Cleaned data. - Raw data. Due to the lack of information, some of the variables and contents from the dataset are interpreted intuitively. In the cleaned dataset, - `gender=-1` means the gender is undisclosed. - `volunteer_level` is ranked from 5 to 0. - `gpa` is calculated on a 4.0 scale. - `writing_score` should be on a 100 scale. - `test_score` has rather limited information. - `work_exp`'s unit is year. *** The dashboard is powered by - [`flexdashboard`](https://pkgs.rstudio.com/flexdashboard/) - [`DT`](https://rstudio.github.io/DT/) - [`plotly`](https://plotly.com/) - The static visualization theme is customized based on [`hrbrmstr`](https://github.com/hrbrmstr/hrbrthemes). *** The codes are open-sourced. **Please feel free to star or fork this repository.**

Star Fork

Right Column Table {.tabset data-width=650 data-height=1000} ----------------------------------------------------------------------- ### Standardized data ```{r} DT::datatable(dat_stand, options = list( bPaginate = FALSE ), style = "bootstrap" ) |> formatStyle( "decision", backgroundColor = styleEqual( c("Admit", "Decline", "Waitlist"), c("#1A6899", "#FC5449", "#FFCF58") ) ) |> formatStyle(c( "gpa", "work_exp", "test_score", "writing_score", "volunteer_level" ), background = styleColorBar(range(c( dat_stand$gpa, dat_stand$work_exp, dat_stand$test_score, dat_stand$writing_score, dat_stand$volunteer_level )), "lightblue"), backgroundSize = "98% 88%", backgroundRepeat = "no-repeat", backgroundPosition = "center" ) ``` ### Cleaned data ```{r} dat <- read_csv("data/data-cleaned.csv") DT::datatable(dat, options = list( bPaginate = FALSE ), style = "bootstrap" ) |> formatStyle( "decision", backgroundColor = styleEqual( c("Admit", "Decline", "Waitlist"), c("#1A6899", "#FC5449", "#FFCF58") ) ) ``` ### Raw data ```{r} DT::datatable(read_csv("data/SummerStudentAdmissions2.csv"), options = list( bPaginate = FALSE ), style = "bootstrap" ) ```